Proteome of SARS-coronavirus, an Early Split-off

The genome organization and expression strategy of the newly identified
severe acute respiratory syndrome coronavirus (SARS-CoV) were predicted
using recently published genome sequences. Fourteen putative open
reading frames were identified, 12 of which were predicted to be
expressed from a nested set of eight subgenomic mRNAs. The synthesis
of these mRNAs in SARS-CoV-infected cells was confirmed experimentally.
The 4382 and 7073 amino acid residue SARS-CoV replicase polyproteins
are predicted to be cleaved into 16 subunits by two viral proteinases
(bringing the total number of SARS-CoV proteins to 28). A phylogenetic
analysis of the replicase gene, using a distantly related torovirus as
an outgroup, demonstrated that, despite a number of unique features,
SARS-CoV is most closely related to group 2 coronaviruses. Distant
homologs of cellular RNA processing enzymes were identified in group 2
coronaviruses, with four of them being conserved in SARS-CoV. These
newly recognized viral enzymes place the mechanism of coronavirus RNA
synthesis in a completely new perspective. Furthermore, together with
previously described viral enzymes, they will be important targets for
the design of antiviral strategies aimed at controlling the further
spread of SARS-CoV.
Abbreviations used: SARS-CoV, severe acute respiratory syndrome
coronavirus; ORF, open reading frame; sg, subgenomic; BCoV, bovine
coronavirus; EToV, equine torovirus; HCoV, human coronavirus; MHV,
mouse hepatitis coronavirus; PL1pro, papain-like proteinase 1; IBV,
avian infectious bronchitis coronavirus; SUD, SARS-CoV unique domain;
TRS, transcription-regulating sequence; XendoU, poly(U)-specific
endoribonuclease; ExoN, 3'-to-5' exonuclease; 2'-O-MT,
S-adenosylmethionine-dependent ribose 2'-O-methyltransferase; ADRP,
adenosine diphosphate-ribose 1''-phosphatase; CPD, cyclic
phosphodiesterase; snoRNA, small nucleolar RNA.

Introduction

Severe acute respiratory syndrome (SARS) is a
life-threatening form of atypical pneumonia that recently emerged in
Guangdong Province, China. A previously unknown coronavirus was
isolated from SARS patients and is considered the cause of this
emerging respiratory disease. In an extraordinary effort, the
full-length genome sequence of the SARS-coronavirus (SARS-CoV) was
elucidated within weeks after the identification of this novel
pathogen and published by the Michael Smith Genome Sciences Center
(Vancouver, Canada; Entrez Genomes accession number NC_004718
(AY274119)), the Centers for Disease Control and Prevention (Atlanta,
USA; GenBank accession number AY278741), and others. The SARS-CoV
genome is ~29.7 kb long and contains 14 open reading frames (ORFs)
flanked by 5'- and 3'-untranslated regions of 265 and 342 nucleotides,
respectively (Figure 1). Homologs of proteins conserved in all
coronaviruses are encoded by the overlapping ORFs 1a and 1b, and by
ORFs 2, 4, 5, 6 and 9a (Figure 1; Tables 1 and 2).
Coronaviruses are enveloped, positive-stranded RNA (+RNA) viruses with
a single-stranded genome of between 27 kb and 31.5 kb, the largest
among known RNA viruses. The genomes of coronaviruses and related
viruses in the order Nidovirales are polycistronic and are expressed
through a sophisticated combination of poorly understood regulatory
mechanisms. Coronavirus genome expression starts with the translation
of two large replicase ORFs (1a and 1b; Figure 1), whose coding
capacity is about twice that of the average complete +RNA virus
genome. Via a -1 ribosomal frameshift, the ORF1a polyprotein (pp1a;
about 4000 amino acid residues) can be extended with ORF1b-encoded
sequences to yield a pp1ab polyprotein of about 7000 amino acid
residues. Replicase polyprotein processing is carried out by two or
three ORF1a-encoded viral proteinases. The processing products are a
group of largely uncharacterized (putative) replicative enzymes,
including an RNA-dependent RNA polymerase, an RNA helicase that is
fused to a complex N-terminal Zn-finger, and a Zn-ribbon-containing
papain-like proteinase. The replicase subunits are thought to assemble
into a viral replication complex that is targeted to cytoplasmic
membranes by various membrane-associated viral proteins. In addition
to genome replication, the coronavirus replicase complex mediates the
synthesis of an extensive nested set of subgenomic (sg) mRNAs
(transcription) to express all ORFs downstream of ORF1b, which encode
a variety of structural and accessory proteins. The number and
composition of these 3'-proximal ORFs vary greatly among
coronaviruses, but they always include genes for the
Figure 1. Overview of the SARS-CoV genome organization and expression. Comparison of the genome organizations
of SARS-CoV and bovine coronavirus (BCoV). The replicase genes are depicted, with ORF1a, ORF1b, and the
ribosomal frameshift site indicated. Arrows represent sites in the corresponding replicase polyprotein that are cleaved
by papain-like proteinases (orange) or the 3C-like cysteine proteinase (blue). Cleavage products are provisionally
numbered nsp1 to nsp16 (see also Table 1). In the 3'-terminal part of the genomes, homologous structural protein genes
are indicated in matching colors. Close-ups of two regions with major differences are shown (and see the text). In the
N-terminal half of replicase ORF1a, SARS-CoV lacks one of the PLpro domains (indicated in orange/green in BCoV)
and contains a unique insertion (SUD). In the region with structural and accessory protein genes, the locations of the
body TRSs involved in subgenomic RNA synthesis are indicated with red boxes (see Figure 3 and Hofmann et al.).
The bottom part of the Figure illustrates which parts of the genome are conserved in the genus Coronavirus and in
the order Nidovirales (the ORF1a sequence of toroviruses, which largely remains to be sequenced, could not be
included). Furthermore, it is indicated for which domains homologs have been identified in other RNA viruses and
the cellular world. Enzymes for which structural data are available are shown in blue. SUD, SARS-CoV unique
domain; PLpro, papain-like cysteine proteinase; 3CLpro, 3C-like cysteine proteinase; TM, transmembrane domain;
ADRP, adenosine diphosphate-ribose 1''-phosphatase; ExoN, 3'-to-5' exonuclease; CLpro, chymotrypsin-like proteinase;
RdRp, RNA-dependent RNA polymerase; HEL1, superfamily 1 helicase; XendoU, (homolog of) poly(U)-specific
endoribonuclease; 2'-O-MT, S-adenosylmethionine-dependent ribose 2'-O-methyltransferase; CPD, cyclic
phosphodiesterase. Domains Ac, X, and Y are described by Ziebuhr et al. and Gorbalenya et al.
Table 1. Predicted SARS-CoV replicase cleavage products and their mode of expression
structural proteins S, M, E and N, which drive cytoplasmic virus
assembly. The mechanisms underlying the synthesis of genomic and
subgenomic RNAs are poorly understood. To explain the composite
structure of the sg mRNAs, which are both 5'- and 3'-coterminal with
the viral genome, several models have been put forward, of which the
one postulating the discontinuous synthesis of negative-stranded sg
templates for sg mRNA synthesis has received wide support recently.
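
The effect of a -1 ribosomal frameshift on the reading frame, as invoked above for the ORF1a/ORF1b junction, can be illustrated with a small sketch. The sequence and the slip position below are toy values chosen for illustration (they are not SARS-CoV coordinates), and the function name is hypothetical.

```python
def codons_with_minus1_shift(rna, slip_after):
    """Read codons in frame 0 for `slip_after` nucleotides, then slip back by one.

    `slip_after` is a toy parameter and must be a positive multiple of 3.
    """
    if slip_after <= 0 or slip_after % 3 != 0:
        raise ValueError("slip_after must be a positive multiple of 3")
    # Codons read before the frameshift (the ORF1a-like frame).
    upstream = [rna[i:i + 3] for i in range(0, slip_after, 3)]
    # After a -1 shift the next codon starts one nucleotide earlier, so the
    # last nucleotide that was read is read again (the ORF1b-like frame).
    restart = slip_after - 1
    downstream = [rna[i:i + 3] for i in range(restart, len(rna) - 2, 3)]
    return upstream + downstream


toy_rna = "AUGUUUAAACGGGUU"  # toy sequence, not a real viral sequence
in_frame = [toy_rna[i:i + 3] for i in range(0, len(toy_rna) - 2, 3)]
shifted = codons_with_minus1_shift(toy_rna, slip_after=9)
print(in_frame)  # codons without a frameshift
print(shifted)   # codons when the ribosome slips back by one nucleotide
```

Without the shift the ribosome continues in the original frame; with the shift, every codon downstream of the slip site is read in the -1 frame, which is how an upstream polyprotein can be extended with sequences encoded by an overlapping downstream ORF.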

On the basis of antigenic cross-reactivity, coronaviruses were
originally classified into three groups (termed groups 1, 2, and 3).
Subsequent phylogeny-based clustering of coronaviruses proved at first
(almost) identical with that based on antigenic cross-reactivity. The
same three clusters were evident upon analysis of the replicase
region, which does not contribute to virion antigenicity. This
indicated that different regions of the coronavirus genome have indeed
co-evolved and that intergroup recombination has not played a
prominent role in coronavirus evolution. However, the agreement
between the two classifications is not perfect, as some coronaviruses
are sufficiently different to not have antigenic cross-reactivity with
the established groups but close enough to cluster with one of them
(group 1) on the basis of sequence comparisons. Consequently, these
viruses were placed into (the expanded) group 1. Here, we refer to
coronavirus groups as evolutionary clusters that unite viruses not
necessarily having antigenic cross-reactivity.

Using the recently published SARS-CoV genome sequences, we provide
insight into the evolution, organization and expression of SARS-CoV.
The SARS-CoV genome and proteome were compared with those of other
coronaviruses, distantly related nidoviruses, and databases, and
several of our predictions were verified experimentally.


Results and Discussion 


SARS-CoV represents a lineage that has split 
off from the group 2 branch relatively late in 
coronavirus evolution 


To optimize our understanding of the SARS-CoV
genome, we sought to infer the phylogenetic 
position of the novel agent relative to known 
Table 2. Predicted SARS-CoV proteins expressed from
coronaviruses. Recent phylogenetic analyses of different SARS-CoV
proteins using unrooted trees consistently showed that SARS-CoV does
not segregate into any of the three currently established coronavirus
groups. These results were interpreted as support for the
classification of SARS-CoV as the prototype of a novel, fourth group
of coronaviruses. However, in our opinion, the evidence leading to
this conclusion was inconclusive, and alternative interpretations,
with SARS-CoV being an outlier in one of the established groups,
remained possible. This uncertainty can be resolved only through the
reconstruction of coronavirus evolution from its origin using a rooted
phylogenetic tree, which is most reliable when an outgroup is included
in the analysis. The closest known outgroup for coronaviruses are the
toroviruses, which form a separate genus in the same virus family. The
ORF1b part of the replicase and the two virion proteins S and M are
homologous in coronaviruses and toroviruses. Unfortunately, however,
the level of conservation of the S and M protein genes is so low that
we consider only the phylogenetic analysis of replicase ORF1b to be
useful.

Consequently, to resolve the phylogenetic position of SARS-CoV, the
equine torovirus (EToV) was included in our analysis, which was
limited to replicase ORF1b, the most conserved part of the genome. It
should be noted, however, that the size of this genome segment (5500
nucleotides) approximates the combined size of the genes encoding the
four virion-associated proteins S, M, E, and N. A fully resolved tree
was obtained, with all branches supported in more than 960 out of 1000
bootstrap trials (Figure 2). The topology of this tree suggests
strongly that the SARS-CoV lineage was an early split-off from the
group 2 branch, which occurred after the two bifurcations that gave
rise to the three major coronavirus groups (Figure 2). Accordingly, in
two regions of the replicase ORF1a polyprotein, nsp1 and one of the
nsp3 domains, which differentiate the three coronavirus groups,
SARS-CoV contains orthologs of domains that are unique for group 2
coronaviruses (see Figure S1 of the Supplementary Material). The
published unrooted trees for the virion proteins and 3CLpro are also
compatible with this phylogeny, although formally we cannot exclude
the occurrence of recombination with other coronaviruses in very
limited regions. In this respect, we would like to stress that the
differences in the composition and arrangement of ORFs in the
3'-proximal region of the genome (downstream of ORF1b; see Figure 1)
between SARS-CoV and established group 2 coronaviruses do not
contradict the above results. Group 1 coronaviruses also differ in
this region through the presence of unique so-called "accessory
non-structural protein genes". Some of these genes have been found to
be dispensable for virus reproduction in tissue culture and/or
animals. The fact that, apparently, they can be acquired or lost
easily in the course of evolution indicates that these genes cannot be
considered reliable group markers.
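
The bootstrap support quoted above rests on resampling alignment columns with replacement and rebuilding the tree for every replicate. The following is only a minimal sketch of the resampling and tallying steps, with a toy alignment and hypothetical function names; the tree building itself is not shown and is not a reimplementation of the software used for the analysis.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate: columns drawn with replacement."""
    n_cols = len(next(iter(alignment.values())))
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[i] for i in picks) for name, seq in alignment.items()}


def bootstrap_support(clade_found_per_replicate):
    """Count the replicates in which a given bifurcation was recovered."""
    return sum(1 for found in clade_found_per_replicate if found)


# Toy alignment; the names and sequences are illustrative assumptions.
toy = {"virusA": "ACGTAC", "virusB": "ACGTTC", "outgroup": "TCGAAC"}
replicate = bootstrap_alignment(toy, random.Random(0))

# A branch recovered in 965 of 1000 replicates would pass a ">960" threshold.
support = bootstrap_support([True] * 965 + [False] * 35)
print(replicate, support)
```

Each replicate has the same length as the original alignment, and every resampled column is one of the original columns, so the variation between replicates reflects only how strongly the data support each bifurcation.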

In conclusion, SARS-CoV is distantly related to established group 2
coronaviruses, a relationship comparable to that observed in group 1
between porcine epidemic diarrhoea coronavirus (PEDV) and human
coronavirus 229E (HCoV-229E) on the one hand, and transmissible
gastroenteritis
Figure 2. Phylogenetic analysis of coronavirus replicase genes.
SARS-CoV replicase ORF1b amino acid sequences (Entrez Genomes
accession number NC_004718 (AY274119)) were compared with those from
viruses representing the three coronavirus subgroups and the genus
Torovirus. Group 1: transmissible gastroenteritis virus (TGEV),
NC_002306; human coronavirus 229E (HCoV-229E), NC_002645; porcine
epidemic diarrhea virus (PEDV), NC_003436. Group 2: mouse hepatitis
virus A59 (MHV-A59), NC_001846; bovine coronavirus (BCoV-Lun),
AF391542. Group 3: infectious bronchitis virus (IBV), strains
Beaudette (NC_001451) and LX4 (AYZ840). Torovirus: equine torovirus
(EToV), X52574. A multiple protein alignment of these sequences was
generated with the help of the ClustalX 1.82 program and was adjusted
manually. Two regions of poor conservation were removed from the
alignment, which was converted subsequently into the nucleotide form.
All columns containing gaps were removed. The resulting alignment
contains the following SARS-CoV sequences fused: 13423-13859,
14310-18857 and 20076-2182. It included 5487 characters, with 3207 of
them being parsimony-informative. Using the PAUP program (version
4.0455) and the parsimony criterion, an exhaustive tree search of the
135,135 evaluated trees identified the best tree having a score of
10927 and the second best tree having a score of 1063; the worst tree
had a score of 1,611. A total of 1000 bootstrap trials were conducted
using the parsimony criterion and a branch-and-bound search to
generate a bootstrap 50% majority-rule consensus tree. The frequency
of occurrence of particular bifurcations in bootstraps is indicated at
the nodes. Similar trees with similarly high bootstrap support above
960 were obtained using the NJ method that was applied to distance
matrices obtained for either nucleotide or amino acid alignments (not
shown).


coronavirus (TGEV) and related viruses on the other hand (Figure 2).
Accordingly, the lack of antigenic cross-reactivity observed between
distant group-mates in group 1 may be observed between SARS-CoV and
the established group 2 viruses. Thus, SARS-CoV may be the first
identified representative of a larger cluster that could be called
subgroup 2b, if the established group 2 coronaviruses would be
referred to as subgroup 2a. The 2b cluster should include the
immediate ancestor of SARS-CoV, which may circulate in the field. If
close relatives of SARS-CoV were to be identified in animal hosts, the
virus would represent the second example of a group 2 coronavirus that
may have crossed the animal-human barrier. The first putative case is
that of the bovine coronavirus (BCoV) and human coronavirus OC43
(HCoV-OC43), two viruses that are so closely related at the genetic
level that they can be considered to be the same virus species.
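
Two quantities in the Figure 2 legend can be checked with a short sketch. Assuming the tree was built from the nine sequences listed there (eight reference viruses plus SARS-CoV), the number of distinct unrooted, fully bifurcating trees is (2n-5)!!, which for n = 9 gives the 135,135 trees evaluated in the exhaustive search. The helper names and the toy alignment below are illustrative assumptions; they also show what removing gapped columns and counting parsimony-informative sites means.

```python
def count_unrooted_trees(n_taxa):
    """Number of unrooted, fully bifurcating trees: (2n-5)!! for n >= 3."""
    if n_taxa < 3:
        raise ValueError("need at least three taxa")
    total = 1
    for k in range(3, 2 * n_taxa - 4, 2):  # 3 * 5 * ... * (2n - 5)
        total *= k
    return total


def drop_gapped_columns(seqs):
    """Remove every alignment column that contains a gap character."""
    kept = [col for col in zip(*seqs) if "-" not in col]
    if not kept:
        return ["" for _ in seqs]
    return ["".join(chars) for chars in zip(*kept)]


def is_parsimony_informative(column):
    """Informative if at least two states each occur in at least two sequences."""
    counts = {}
    for state in column:
        counts[state] = counts.get(state, 0) + 1
    return sum(1 for c in counts.values() if c >= 2) >= 2


toy_alignment = ["ACG-T", "ACGAT", "TCGAA", "TCG-A"]  # toy data
cleaned = drop_gapped_columns(toy_alignment)
n_informative = sum(is_parsimony_informative(col) for col in zip(*cleaned))
print(count_unrooted_trees(9), cleaned, n_informative)
```

Only informative columns can favor one tree topology over another under the parsimony criterion, which is why the legend reports their number alongside the total alignment length.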


Two proteinases are predicted to cleave the
SARS-CoV replicase polyproteins into 16 
subunits, the largest of these having a unique 
domain organization 


A detailed comparison of the SARS-CoV replicase with that of its
closest known relatives in group 2, mouse hepatitis coronavirus (MHV)
and BCoV (Figure 1), revealed a replicase proteolytic processing
scheme and domain organization that, with some notable exceptions (see
below), proved to be typical for group 2 viruses. Using the conserved
signatures of the cleavage sites recognized by coronavirus proteinases
and their flanking sequences, we predict the generation of 16
replicase subunits through proteolysis mediated by 3CLpro (11
cleavages) and PL2pro (three cleavages) (Figure 1 and Table 1).
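
The prediction step described above, scanning a polyprotein for conserved cleavage-site signatures and cutting at each match, can be sketched as follows. The pattern used here (cleavage after Q in L-Q followed by S, A or G) is a simplified assumption standing in for a 3CLpro-like signature, not the set of signatures used for the actual predictions, and the sequence is a toy example.

```python
import re

# Assumed, simplified signature: cut after "LQ" when the next residue is S, A or G.
ASSUMED_SITE = re.compile(r"LQ(?=[SAG])")


def cleave(polyprotein, pattern=ASSUMED_SITE):
    """Split a polyprotein after every match of the cleavage-site pattern."""
    cut_points = [m.end() for m in pattern.finditer(polyprotein)]
    products, start = [], 0
    for cut in cut_points:
        products.append(polyprotein[start:cut])
        start = cut
    products.append(polyprotein[start:])  # C-terminal product
    return products


toy_polyprotein = "MKTLQSAAVLQGPPRLQAWW"  # toy sequence with three assumed sites
products = cleave(toy_polyprotein)
print(products)
```

In general, n cleavage sites in one polyprotein give n + 1 products; the toy sequence has three sites and therefore yields four products.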

The most conspicuous differences between known group 2 coronaviruses
and SARS-CoV were identified in nsp3, the largest replicase subunit
that is encoded by ORF1a (Table 1). Unlike all other coronaviruses,
SARS-CoV does not have an ortholog of papain-like proteinase 1
(PL1pro; see close-up in Figure 1), which was probably lost during
evolution of this lineage. This observation implies that the three
cleavages in the N-terminal half of pp1a must all be performed by the
conserved PL2pro, a downstream-located paralog of PL1pro. The ortholog
of this proteinase appears to dominate over PL1pro in HCoV-229E, and
is the only active PLpro in avian infectious bronchitis coronavirus
(IBV). Immediately upstream of PL2pro, we identified a 375 amino acid
residue "orphan domain" in SARS-CoV (called SUD for SARS-CoV unique
domain; Figure 1), which is not present in other coronaviruses. The
corresponding ORF1a region differs profoundly among group 1
coronaviruses. In one of these viruses (TGEV), and in the group 3 IBV,
this region contains just a few amino acid residues, essentially
fusing PL2pro to the upstream X domain. In contrast, HCoV-229E and
PEDV share a conserved domain in this position. Interestingly, nsp3
also was the main site of replicase differences between BCoV variants
isolated from respiratory and intestinal samples from an animal that
had died during an outbreak of fatal shipping pneumonia. Due to the
plausible
Figure 3. SARS-CoV subgenomic RNA synthesis. (A) Organization of ORFs in the 3' end of the SARS-CoV genome,
with predicted leader and body TRSs indicated by small boxes. The subgenomic mRNAs resulting from the use of
these TRSs for leader-to-body fusion are depicted below, with mRNAs predicted to be functionally bicistronic
indicated with an asterisk (*). (B) Hybridization analysis of intracellular viral RNA from Vero cells infected with
SARS-CoV, Frankfurt (Fr) and HKU-39849 (HK) isolates. See Materials and Methods for technical details.
Oligonucleotides complementary to sequences from the SARS-CoV leader sequence and to a region in the genomic 3' end
both recognized a set of nine RNA species (the genome (RNA1) and eight subgenomic RNAs), confirming the presence
of common 5' and 3' sequences. RNA from Vero cells infected with avian infectious bronchitis virus (IBV), which
produces only five subgenomic mRNAs of known sizes, was run in the same gel and used as a size marker.
(C) Model for nidovirus subgenomic RNA synthesis by discontinuous extension of minus strands. Whereas
genome replication relies on continuous minus strand synthesis (antigenome), subgenomic minus strands would be
produced by attenuation of nascent strand synthesis at a body TRS (red bar), followed by translocation of the nascent
strand to the leader TRS in the genomic template. Following base-pairing between the body TRS complement at the
3' end of the minus strand and the leader TRS, RNA synthesis would resume to complete the subgenomic minus
strand that would then serve as template for the transcription of subgenomic mRNAs.


multifunctionality of nsp3, which may be involved in the control of
subgenomic mRNA synthesis, the gross internal rearrangements and point
mutations in this protein may have pleiotropic effect(s) on SARS-CoV
properties, including its pathogenic potential.


SARS-CoV produces eight subgenomic
mRNAs to express the ORFs located in the
3'-proximal part of the genome


In a striking parallel with the unique features of nsp3, the
3'-proximal part of the SARS-CoV genome contains five ORFs (6, 7a, 7b,
8a and 8b) that are not present in established group 2 coronaviruses
and for which no obvious homologs could be identified upon sequence
comparison. Furthermore, SARS-CoV lacks counterparts for two genes
inserted between replicase ORF1b and the S gene in subgroup 2a viruses
(see the close-up in Figure 1). All these ORFs (from 2 to 9b) are
predicted to be expressed from sg mRNAs in SARS-CoV. In members of the
genus Coronavirus and the related family Arteriviridae, all sg mRNAs
are 3'-coterminal with the viral genome, and contain a common 5'
leader sequence that is identical with that of the genome. The fusion
of the leader to the coding part (or "body") of each of the sg RNAs
involves a discontinuous step in RNA synthesis, which is currently
believed to occur during minus strand synthesis, thus producing
composite subgenomic negative-stranded templates for sg mRNA synthesis
(Figure 3(C)). Leader-to-body joining is guided by a base-pairing
interaction involving conserved transcription-regulating sequences
(TRSs; also previously termed "intergenic sequences (IGSs)" in
coronaviruses), which are found at the 3' end of the genomic leader
(leader TRS) and at the 5' end of each of the sg RNA bodies (body
TRSs), often located exactly between two genes, but sometimes located
within the coding sequence of an upstream gene (Figures 1 and 3(A)).
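
The composite structure that results from leader-to-body joining can be sketched on a toy genome. The sketch below only assembles the final plus-strand products (common leader, one TRS copy, and the body down to the 3' end); it does not model the minus-strand step at which the discontinuity is believed to occur. The toy genome and the function name are assumptions made for illustration.

```python
import re

CORE_TRS = "ACGAAC"  # conserved core TRS sequence of SARS-CoV


def sg_mrnas(genome, core=CORE_TRS):
    """Return toy sg mRNAs: the 5' leader fused to each body at its TRS."""
    sites = [m.start() for m in re.finditer(core, genome)]
    if len(sites) < 2:
        raise ValueError("need a leader TRS and at least one body TRS")
    leader_site, body_sites = sites[0], sites[1:]
    leader = genome[:leader_site]  # sequence upstream of the leader TRS
    # Leader and body copies of the core TRS are identical, so the junction
    # carries a single copy, followed by everything down to the 3' end.
    return [leader + genome[site:] for site in body_sites]


# Toy genome: leader + TRS, a "replicase" stretch, then two TRS-preceded bodies.
toy_genome = "UUAU" + "ACGAAC" + "GGGGGGGG" + "ACGAAC" + "CCCCAUG" + "ACGAAC" + "UUUGGA"
mrnas = sg_mrnas(toy_genome)
for mrna in mrnas:
    print(len(mrna), mrna)
```

All products share the same 5' leader and the same 3' end and differ only in how much of the genome lies between them, which is what makes the set "nested".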

In the SARS-CoV genome we readily identified a potential leader TRS
(5'-CUAAACGAACUUU-3') that has a 6 to 11 nucleotide match with a
number of sequences in the 3' end of the genome, many of which are
positioned immediately upstream of viral genes (Figure 3(A)). As
recognized also by others, the sequence 5'-ACGAAC-3' is absolutely
conserved and can be considered the core of the SARS-CoV TRS. Based on
the SARS-CoV sequence with the largest 5'-terminal segment (accession
number AY278741), the SARS-CoV leader sequence is (at least) 72
nucleotides long, similar to, e.g. that of BCoV, with which it has a
striking 20 out of 21 nucleotides match immediately upstream of the
leader TRS (5'-GAUCUCUUGUAGAUCUGUUC-3'). On the basis of the location
of putative body TRSs, the synthesis of nine mRNAs by SARS-CoV was
expected: the genomic mRNA (RNA1) and eight subgenomic mRNAs with
sizes of approximately 8.4, 4.6, 3.8, 3.5, 3.0, 2.6, 2.1 and 1.8 kb
(including 5' leader and 3' poly(A) tail). However, in the first
published experimental analysis of the SARS-CoV-specific mRNAs
generated in infected Vero cells, the synthesis of only five viral
mRNAs could be confirmed.
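
A search for candidate body TRSs of the kind described above can be sketched as a scan for the conserved core followed by an extension of the match along the leader TRS. The leader TRS and core sequences are those given in the text; treating the match as one contiguous stretch around the core, and the toy genome itself, are assumptions of this sketch.

```python
import re

LEADER_TRS = "CUAAACGAACUUU"
CORE = "ACGAAC"


def contiguous_match(genome, core_start):
    """Length of the contiguous match with the leader TRS around one core site."""
    offset = LEADER_TRS.index(CORE)  # position of the core within the leader TRS
    left = 0
    while (left < offset and core_start - left - 1 >= 0
           and genome[core_start - left - 1] == LEADER_TRS[offset - left - 1]):
        left += 1
    right = 0
    core_end = core_start + len(CORE)
    trs_after = offset + len(CORE)
    while (trs_after + right < len(LEADER_TRS) and core_end + right < len(genome)
           and genome[core_end + right] == LEADER_TRS[trs_after + right]):
        right += 1
    return left + len(CORE) + right


def scan_trs(genome):
    """Map each core TRS position to the length of its match with the leader TRS."""
    return {m.start(): contiguous_match(genome, m.start())
            for m in re.finditer(CORE, genome)}


# Toy genome: one full leader TRS copy and one partial (9 nt) body TRS match.
toy_genome = "GGG" + "CUAAACGAACUUU" + "GGGG" + "GAAACGAACUAG"
hits = scan_trs(toy_genome)
print(hits)
```

Every hit contains at least the six core nucleotides, and longer matches extend into the flanking positions of the leader TRS.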

To investigate SARS-CoV RNA synthesis in more detail, Vero cells were
infected with SARS-CoV isolates Frankfurt and HKU-39849, and
intracellular RNA was analyzed by hybridization with oligonucleotide
probes complementary to a part of the 5' leader sequence and a
sequence just upstream of the 3' poly(A) tail. The coronavirus IBV,
which also replicates in Vero cells, was used as control and size
marker. As illustrated in Figure 3(B), the genomic RNA and all eight
predicted subgenomic transcripts were detected with both SARS-CoV
probes, confirming the fact that these RNAs contain both common
5'-terminal and common 3'-terminal sequences. Remarkably, a slight
mobility shift was observed for RNAs 7 and larger of the Frankfurt-1
isolate. The subsequent sequence analysis of this virus revealed that
this was due to a 45 nt in-frame deletion in ORF7, probably the first
documented example of SARS-CoV genetic adaptation to cell culture
conditions. The confirmation of leader-body fusion sites of the
SARS-CoV subgenomic mRNAs will be published elsewhere. Remarkably, up
to four of the eight SARS-CoV subgenomic mRNAs (3, 7, 8, and 9) may be
functionally bicistronic (Table 2), as observed occasionally for other
coronavirus subgenomic mRNAs.
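
The observation that a deletion in ORF7 shifts the mobility of "RNAs 7 and larger" follows directly from the nested, 3'-coterminal structure of the transcripts: an RNA carries a given genome position only if its body starts upstream of that position. The sketch below illustrates this with purely hypothetical coordinates on a 3000 nt toy genome; none of the numbers are SARS-CoV positions.

```python
def rnas_containing(position, body_starts, genome_length):
    """Names of the RNAs whose body (body start to 3' end) covers `position`."""
    if not 0 <= position < genome_length:
        raise ValueError("position lies outside the genome")
    return [name for name, start in body_starts.items() if start <= position]


# Hypothetical body start coordinates; RNA1 is the full-length genome.
TOY_GENOME_LENGTH = 3000
toy_body_starts = {
    "RNA1": 0, "RNA2": 1000, "RNA3": 1400, "RNA4": 1700, "RNA5": 1900,
    "RNA6": 2100, "RNA7": 2300, "RNA8": 2500, "RNA9": 2700,
}

# A deletion located in the region that is 5'-proximal on RNA7.
affected = rnas_containing(2350, toy_body_starts, TOY_GENOME_LENGTH)
print(affected)

# A 45 nt deletion removes exactly 15 codons, so the reading frame is kept.
deletion_keeps_frame = 45 % 3 == 0
```

With these toy coordinates, RNAs 1 to 7 are shortened by the deletion, whereas RNAs 8 and 9, whose bodies start downstream of it, are unaffected.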


The replicase of coronaviruses includes a
variety of putative RNA-processing enzymes 


The production of a complex and diverse set of RNA molecules by
nidoviruses (including SARS-CoV) is linked to an unparalleled
complexity of their giant replicase, which contains a variety of
(putative) enzymatic functions and a number of completely
uncharacterized domains (Figure 1). We have initiated the
characterization of coronavirus replicase by comparative genomics, and
have regularly updated this analysis through recent years. Our
continuing analysis has now identified distant coronavirus homologs of
not less than five cellular enzymes that are associated with RNA
processing (Figure 4): poly(U)-specific endoribonuclease (XendoU), a
3'-to-5' exonuclease (ExoN) that belongs to the DEDD superfamily,
S-adenosylmethionine-dependent ribose 2'-O-methyltransferase (2'-O-MT)
of the RrmJ family, adenosine diphosphate-ribose 1''-phosphatase
(ADRP), and cyclic phosphodiesterase (CPD).


In the SARS-CoV proteome, conserved domains presumably associated with
these activities were mapped (from the N to C terminus) to the X
domain of nsp3 (ADRP), the N-terminal domain of nsp14 (ExoN), a
"nidovirus-specific" replicase domain in the C-terminal part of nsp15
(XendoU), and nsp16 (2'-O-MT). The CPD-related domain is not conserved
in SARS-CoV, but was identified in the product of ORF2 of established
group 2 coronaviruses and in the very C-terminal domain of the
torovirus ORF1a polyprotein, as well as in some double-stranded RNA
rotaviruses.

The conservation in the ExoN, 2'-O-MT and CPD-related domains of
nidoviruses includes the catalytic and other active-site residues
identified in the prototype cellular enzymes. Although the active-site
residues of the ADRP and XendoU families are yet to be characterized,
the most conserved amino acids of these families are found in their
putative nidovirus homologs. Some of the nidovirus domains may contain
unique and conserved additional domains. For instance, we noted that
the nidovirus ExoN homologs contain an additional conserved domain
resembling a mononuclear Zn-finger (Figure 4(B)) between the
universally conserved blocks I and II, which include the catalytic
residues (two Asp and one Glu). Another Zn-finger-like module has been
inserted between blocks II and III in the ExoN homolog of roniviruses,
a subset of nidoviruses (data not shown). Our combined observations
indicate that the nidovirus homologs of these cellular RNA processing
enzymes must be enzymatically active, although they may have evolved
to act on specific (and unique) substrates or have additional unique
components.

The newly predicted enzymes could be involved in the metabolism of
virus and/or cellular RNAs. For instance, the 2'-O-MT activity could
be used to produce the 5'-cap of viral mRNAs, as was demonstrated for
a homologous flavivirus enzyme. Based on a parallel with some cellular
DNA-processing homologs, like exonuclease T and the exonuclease domain
of DNA polymerases, it is tempting to speculate on a link between the
ExoN activity and RNA proofreading, repair, and/or recombination. The
first two activities are not known in RNA viruses, and recombination
commonly proceeds through the copy-choice mechanism with RdRp
switching templates to produce chimeric nascent chains. However, due
to the extreme sizes of their giant genomes, coronaviruses may differ
from other RNA viruses and share an unprecedented similarity with
DNA-based lifeforms in the mechanisms of genome biosynthesis and
maintenance. If confirmed, these unusual properties would explain the
preliminary reports on the resistance of SARS-CoV to ribavirin, a drug
that was shown to force other RNA viruses into "error catastrophe".
The experimental verification of these predictions will be an
important step in increasing our understanding of the functional roles
these putative enzymes play in the
Figure 4. Sequence alignments of protein families that include cellular enzymes involved in RNA processing and
their nidovirus homologs. Our comparative sequence analysis (see Materials and Methods) revealed a
statistically significant relationship between functionally uncharacterized proteins (domains) of nidoviruses, including
far (J.Z. and A.E.G., unpublished results). This [...] of mature tRNA
and the production of intron-encoded box C/D small nucleolar RNA
(snoRNA) from its host pre-mRNA (Figure 5(A)). In the


first pathway, XendoU initiates a cascade of poorly characterized
endo- and exonuclease reactions that may involve ExoN, a homolog of
the yeast Rrp6p exosome component, ultimately leading to the
production of mature U16 and U86 snoRNAs. Subsequently, these snoRNAs
may be utilized in diverse rRNA processing events involving nucleotide
methylation by fibrillarin, a 2'-O-MT, and assisted by helicase(s).
Strikingly, the homologs of three cellular enzymes from this pathway,
encoded in the replicases of all nidoviruses except for arteriviruses,
are genetically clustered in a single protein block (nsp14-nsp16)
immediately downstream of the RNA helicase (nsp13) (Figures 1 and 4).
Because of the proximity of these four domains to each other, their
expression must be tightly coordinated at the level of 3CLpro
proteolysis and by the upstream ORF1a/ORF1b ribosomal frameshift
signal.

In the other pathway, which involves tRNA processing, the utilization
of a 2'-phosphate group of a splicing intermediate involves the
conversion of adenosine diphosphate ribose 1''-2'' cyclic phosphate
(Appr>p) by CPD into adenosine diphosphate ribose 1''-phosphate
(Appr-1''p), of which the phosphate group may be further processed by

SARS-CoV, and five protein families that include enzymes involved in two nuclear RNA processing pathways: intron
excision to produce mature tRNA and the production of intron-encoded box C/D small nucleolar RNA (snoRNA)
from its host pre-mRNA (Figure 5). Shown are alignments for key regions of a few selected members of the following
groups of enzymes: (A) XendoU family; (B) ExoN family; (C) 2'-O-MT family; (D) CPD family; and (E) ADRP family.
These protein families may be known also under other names. Cellular homologs, not necessarily including proteins
involved in the discussed RNA processing pathways, are listed in the top segment of each alignment and nidovirus
proteins in the bottom segment. In the CPD family, along with those of group 2 coronaviruses, proteins of
rotaviruses (double-stranded RNA viruses), which were identified in this study, are listed. In both segments, residues
are highlighted independently: black for absolutely conserved residues and different shades of grey to indicate
different levels of conservation; amino acid similarity groups used were: (i) D, E, N, Q; (ii) S, T; (iii) K, R; (iv) F, W;
and (v) I, L, M, V. Positions occupied by identical or similar residues in all proteins under comparison are indicated
with an asterisk (*) and colon (:), respectively, in the intersegment row. For the ExoN family, three motifs conserved
in the DEDD superfamily and the Zn-finger of the ExoN family are indicated. Database accession numbers for
nidovirus genome sequences: SARS-CoV, Entrez Genomes accession number NC_004718 (AY274119); MHV-A59,
NC_001846; BCoV-Lun, AF391542; HCoV-229E, NC_002645; IBV-B, NC_001451; PEDV, NC_003436; TGEV,
NC_002306; equine torovirus (EToV), X52574; equine arteritis virus (EAV), X33459; porcine reproductive and
respiratory syndrome virus (PRRSV), M9622; gill-associated virus (GAV), AF227196. Abbreviations and NCBI protein
database ID numbers or SwissProt names of the remaining protein sequences are: (A) Npun 0542, hypothetical protein
of Nostoc punctiforme, ZP-00106I9%; Poli mB, pancreatic protein of Paracas almncens, BAAR; Coleg Ppl,
placental protein 11-like precursor of Caenorhabditis elegans, NP_492590; Xlaev endoU, endoU protein of Xenopus laevis,
CADIS; pp1b, ORF1b-encoded part of nidovirus replicase polyprotein 1ab. (B) Yeast PAN2, PAB-dependent
poly(A)-specific ribonuclease subunit PAN2 of Saccharomyces cerevisiae, PSSDIO; Mjoge DPOS, DNA polymerase III,
polC-type, containing exonuclease domain, of Mycoplea gota, PA7277; Baca DING, probable ATP-dependent
helicase dinG homolog, containing exonuclease domain, of Bacillus subtilis, PSAG84; Ecol DPSE, DNA polymerase III,
epsilon chain, containing exonuclease domain, of Escherichia coli, PUS007 (PDB: 153 and 1); Boo RNT,
exoribonuclease of Escherichia coli, 30014. (C) Heap AKA, A-kinase anchoring protein 18 gamma of Homo sapiens,
AAFISI06; Athal CPD1, putative CPD1 of Arabidopsis thaliana, CAAIG?S0; Athal CPD2, putative CPD2 of Arabidopsis
thaliana, CAAIS7SI; yeast YC59, hypothetical 267 kDa protein of yeast, P3334; Ecol LIGT, 2'-5' RNA ligase of
Escherichia coli, P37035; ns2, nonstructural protein (ORF2-encoded) of the coronaviruses HCoV-OC43 (AAATEI7A),
BCoV Quebec (PI8517), and MHV-A59 (PIN758); EToV pp1a, C-terminal fragment of EToV pp1a, 11237; HRoV VP3,
VP3 of human rotavirus, BAASS964; ARoV VP3, VP3 of avian rotavirus PO-13, BAA2SI2S. (D) Ecol 0177, putative
polyprotein of Escherichia coli, ACTAI29; Hsap Y1268a, KIAA1268 protein of Homo sapiens, BASES; Hsap LAL,
histone macroH2A1.1 of Homo sapiens, AACHSI34; yeast YMIN7, hypothetical 321 kDa protein of yeast, 04299; yeast
YN2, hypothetical 12.9 kDa protein of yeast, PS8218. (E) Yeast YBRI, putative ribosomal RNA methyltransferase
(rRNA (uridine-2'-O-)-methyltransferase) of yeast, PAS238; yeast SPB1, putative rRNA methyltransferase SPB1 of
yeast, P25592; yeast YGNS, putative ribosomal RNA methyltransferase YCL13«e (rRNA (uridine-2'-O-)-methyltransferase)
of yeast, P53125; Ecol FtsJ, cell division protein of Escherichia coli, NP-S17646,
Figure 5. Nidoviruses encode homologs of cellular enzymes
involved in RNA processing. (A) The cellular pathways for
processing of pre-snoRNA and pre-tRNA splicing are summarized,
with relevant enzymatic activities indicated. For details,
see the text. Homologs of the highlighted enzymes have been
identified in nidoviruses (see also Figure 1 and the text).
(B) Table summarizing the conservation of homologs of the
cellular enzymes presumably involved in RNA processing in
SARS-CoV and different nidovirus groups.


an ADRP. Both these activities may drive the
production of mature tRNA. Although the nidovirus
homologs of CPD and ADRP remain to be
characterized, they are not under the control of
the ORF1a/ORF1b ribosomal frameshift signal
(Figure 1) and may thus, unlike the ORF1b-encoded
enzymes, be produced in larger quantities.

The nidovirus homologs of the five RNA processing
enzymes discussed above may interfere with
these or similar cellular RNA processing pathways
to reprogram the cell for the benefit of virus
reproduction. It seems even more conceivable that they,
alone or in concert with other enzymes like the
RdRp or helicase, are involved directly in viral
RNA synthesis, particularly in transcription,
which, in an apparent parallel with snoRNA-driven
processes, is guided by conserved oligonucleotide
base-pairing interactions (Figure 3(C)).
The viral enzymes, like their cellular counterparts,
might be part of separate pathways or, alternatively,
cooperate in a single pathway in which the
XendoU, ExoN and 2'-O-MT homologs provide
RNA specificity, and the CPD and ADRP homologs
modulate the pace through processing of
compound(s) containing a phosphate group. In this
respect, we note that both the XendoU/ExoN/2'-O-MT
and CPD/ADRP cellular pathways start
with an endoribonuclease-mediated cleavage to
produce molecule(s) with 2',3'-cyclic phosphate
termini (Figure 5), indicating the structural basis
for possible cooperation of the coronavirus
homologs of these enzymes in a single pathway. The
expected functional hierarchy of the five putative
nidovirus enzymes (Figure 3(A)) is supported by
their corresponding evolutionary conservation,
with the XendoU homolog being absolutely
conserved and the CPD homolog being least
conserved among nidoviruses (Figure 5(B)).


Concluding Remarks 


The availability and comparative analysis of the
SARS-CoV genome and proteome set the stage for
the extensive biological characterization of this
emerging pathogen and the development of
anti-SARS-CoV strategies. Our conclusion that
SARS-CoV is distantly related to group 2 coronaviruses
(Figure 2) implies that viruses from this group, in
particular the extensively studied mouse hepatitis
virus and its derivatives lacking "non-essential"
CPD-like and HE genes, may be the best available
models for both in vitro and in vivo studies, in
particular where the synthesis of viral macromolecules
and the structure and function of the replication
complex are involved. A detailed comparative
characterization of the BCoV/HCoV-OC43 pair
may provide invaluable insights into the processes
of adaptation of a non-human coronavirus to a
human host, which should be highly relevant to
understanding the emergence of SARS-CoV. The
SARS-CoV genome (Figure 1) lacks genes that are
common in group 2 viruses, like PL1pro and
CPD-like and HE genes, but encodes a number of
unique protein sequences, underlining the ability
of coronaviruses to undergo gross evolution. The
comparative studies presented here have tentatively
identified both known and novel viral enzymes
(Figures 1 and 5), most of which may be involved
in RNA processing and have homologs of which
the tertiary structure has been solved (Figure 1).
Intriguing parallels have been drawn between
these putative viral enzymes and characterized,
but distant, cellular homologs that will guide the
functional dissection of the replicases of SARS-CoV
and related viruses and may put the mechanism
of coronavirus RNA synthesis in a completely
new perspective. The newly described putative
enzymes of SARS-CoV double the list of potential
targets for the design of antiviral strategies aimed
at controlling this emerging virus infection.


Materials and Methods 
Analysis of intracellular SARS-CoV RNA


Vero cells were infected with SARS-CoV (Frankfurt 1
or HKU-39849) at an MOI of 101 or were mock infected.
At the onset of cytopathogenic effect (approximately 10
hours post infection) intracellular RNA was isolated by
cell lysis for ten minutes at room temperature with 5%
(w/v) lithium dodecyl sulfate in LET buffer (10 mM
Tris-HCl (pH 7.8), 100 mM LiCl, 1 mM EDTA), containing
20 µg/ml of proteinase K. After shearing of the
cellular DNA using a syringe, lysates were incubated at
42 °C for 15 minutes, extracted with phenol (pH 4.0)
and chloroform, and RNA was ethanol-precipitated. The
RNAs were separated in denaturing 1% (w/v) agarose
gels containing 2.2 M formaldehyde and Mops buffer
Go mM Mops (sodium salt) (pH 7), 5 mM sodium acetate,
1 mM EDTA). Dried gels were used for direct
hybridization with 32P-labeled oligonucleotides
SARSVGMI (5'-CGAGGITGGTIGGCITITECIG) and
SARSVOI2 (5'-CACATGGGGATAGCACTAG), which
are complementary to sequences in the SARS-CoV leader
sequence and the genomic 3' end, respectively. After
hybridization, gels were analyzed using a Personal FX
Molecular Imager and Quantity One software (both
from Bio-Rad).


Methods for bioinformatics 


Genpeptides, Conserved domain (CD) and protein
family (Pfam) databases were used in this study.
Amino acid sequence alignments were generated using the
ClustalX1.81 and Dialign2 programs assisted by
Blosum position-specific matrices and were processed
for presentation using GeneDoc. Multiple sequence
alignments were converted into hidden Markov model
(HMM) profiles using HMMER2.01 software. Sequence
databases were searched in default mode, unless stated
otherwise, using the HMMER2.01 package and a
family of Blast programs.

The expectation values of similarity (E) of 0.05 or
lower for Blast searches and 0.1 or lower for
HMMER-mediated searches were considered to be statistically
significant. Database searches with nidovirus proteins
(Tables 1 and 2) and their alignments were conducted in
an iterative manner until no new homologs were
identified. Also, sequences that were identified below the
threshold during the last iteration were used to initiate
reciprocal searches that might have resulted in new
significant matches. This approach worked for all protein
families described here, except for the identification of
the relationship between the nidovirus ExoN family and
cellular DEDD superfamily, which is known to be


extremely diverse. In this latter case, using the MAST
program, we found a strong match (= 36")
between the most conserved motif III of a DEDD protein
and a conserved block of the ExoN family that facilitated
the identification of the two other motifs in the nidovirus
proteins having a non-typical intermotif spacing
partially occupied by Zn-finger(s) (see the text and
Figure 4). Furthermore, we observed an approximately
4 mer, selective increase of the global similarity
between the ExoN family and DEDD proteins, after the
coronavirus sequences were modified artificially by
removing putative Zn-fingers that are not present in the
DEDD proteins. In the HMMER-mediated searches of
Si sequences using this Zn-finger-deficient ExoN
family as a query, numerous DEDD proteins were
retrieved immediately after the nidovirus proteins,
starting with E ~ OSL. The relatively poor statistics of these
hits were due to the failure by HMMER to align all
three motifs.
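
The iterative search procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: `search` is a hypothetical stand-in for a single Blast or HMMER run that returns an E-value per hit, `e_cutoff` is whichever significance threshold applies to the program in use, and `max_rounds` is an added safety limit that the text does not mention.

```python
from typing import Callable, Dict, Iterable, Set, Tuple

# Hypothetical interface: one database search seeded with the current
# family members, returning {hit_id: E-value}. It stands in for a Blast
# or HMMER run and is not an API of either package.
SearchFn = Callable[[Set[str]], Dict[str, float]]


def iterative_search(
    seeds: Iterable[str],
    search: SearchFn,
    e_cutoff: float,
    max_rounds: int = 20,  # assumption: safety limit, not from the text
) -> Tuple[Set[str], Dict[str, float]]:
    """Repeat the search until no new significant homologs appear.

    Returns the accumulated family and the sub-threshold hits of the
    last round, which are candidates for reciprocal searches.
    """
    family = set(seeds)
    borderline: Dict[str, float] = {}
    for _ in range(max_rounds):
        hits = search(family)
        # Hits at or below the cutoff are treated as significant.
        new = {h for h, e in hits.items() if e <= e_cutoff} - family
        # Remember hits that missed the cutoff in this round.
        borderline = {
            h: e for h, e in hits.items()
            if e > e_cutoff and h not in family
        }
        if not new:
            break
        family |= new
    return family, borderline
```

The second return value corresponds to the sequences found below the significance threshold in the last iteration; each of them could in turn be passed as a seed to a fresh call to mimic the reciprocal searches.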

Cluster phylogenetic trees were reconstructed using
the neighbour-joining algorithm described by Saitou &
Nei with the Kimura correction, and were evaluated
with 1000 bootstrap trials, as implemented in the
ClustalX1.81 program. Parsimonious trees were generated
using exhaustive search and evaluated with bootstrap
branch-and-bound search using a UNIX version of the
PAUP* 110435 program that is included in the GCG
Wisconsin Package programs. The resulting trees were
visualized using the TreeView program.
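
Two ingredients of the tree evaluation above, a corrected pairwise distance and bootstrap resampling of alignment columns, can be illustrated as follows. The sketch assumes that the Kimura correction takes the common empirical form for proteins, d = -ln(1 - p - 0.2p^2), where p is the fraction of differing ungapped positions. It does not implement neighbour-joining itself, and the function names are illustrative rather than taken from any of the programs named in the text.

```python
import math
import random
from typing import List, Sequence


def p_distance(a: str, b: str) -> float:
    """Fraction of differing residues over columns ungapped in both."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns shared by the two sequences")
    return sum(x != y for x, y in pairs) / len(pairs)


def kimura_distance(a: str, b: str) -> float:
    """Kimura-corrected distance, assuming d = -ln(1 - p - 0.2 * p**2)."""
    p = p_distance(a, b)
    arg = 1.0 - p - 0.2 * p * p
    if arg <= 0.0:
        # The formula is undefined for very divergent pairs.
        raise ValueError("sequences too divergent for this correction")
    return -math.log(arg)


def bootstrap_columns(alignment: Sequence[str], rng: random.Random) -> List[str]:
    """Resample alignment columns with replacement (one bootstrap trial)."""
    n = len(alignment[0])
    cols = [rng.randrange(n) for _ in range(n)]
    # The same column indices are applied to every sequence, so each
    # resampled column is still a column of the original alignment.
    return ["".join(seq[c] for c in cols) for seq in alignment]
```

In a full analysis, each bootstrap replicate would be turned into a distance matrix and a tree, and the support for a branch would be the fraction of replicates in which that branch reappears.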